Introduction

Analyzing Data Science Professions in Business

Introduction

My project will be “Analyzing Data Science Professions in Business.” For this project, I will use a dataset from Kaggle called “Salary of data professions.” Kaggle has pre-processed the data in this dataset to identify salary information about business professionals.

Research Questions

  • research questions
    • What are the highest-paying areas of business in which to work as a data professional?
    • Do men have advantages over women when entering the field of data science?
    • Are there disproportionate pay gaps in data science fields?

Background

Why I chose this topic

I chose this topic because after I graduate, I plan to use my data analytics minor in my career so I can combine my understanding of accounting with data analytics. This is relevant to my life because as a girl with a major that is mostly male-dominated, I am interested to see if that is also the case with my minor.

Research Plan

To answer my research questions, I used data visualization techniques to analyze patterns in the data. These graphs include bar charts, scatter plots, and box plots, which I used to identify trends and relationships between variables. I was especially interested in discovering how the variables connect, both directly and indirectly.

Data

First 500 observations

Variables

In order to analyze differences between concentrations and salary amounts of different data professionals, we need to understand what each variable means and how it is relevant to the study.

  • First Name: first name of employee
  • Last Name: last name of employee
  • Sex: sex of employee
  • Date of Join: date the employee started working the job
  • Current Date: current date
  • Designation: rank of the employee
  • Age: age of employee
  • Salary: annual salary of the employee
  • Unit: business unit of the employee
  • Leaves Used: number of leaves used by the employee
  • Leaves Remaining: number of leaves remaining for the employee
  • Ratings: performance ratings of the employee
  • Past Experience: past work experience before joining the current company
Rows: 2,639
Columns: 13
$ FIRST.NAME       <chr> "TOMASA", "ANNIE", "OLIVE", "CHERRY", "LEON", "VICTOR…
$ LAST.NAME        <chr> "ARMEN", "", "ANCY", "AQUILAR", "ABOULAHOUD", "", "AG…
$ SEX              <chr> "F", "F", "F", "F", "M", "F", "M", "M", "F", "F", "F"…
$ DOJ              <chr> "5-18-2014", "", "7-28-2014", "04-03-2013", "11-20-20…
$ CURRENT.DATE     <chr> "01-07-2016", "01-07-2016", "01-07-2016", "01-07-2016…
$ DESIGNATION      <chr> "Analyst", "Associate", "Analyst", "Analyst", "Analys…
$ AGE              <int> 21, NA, 21, 22, NA, 22, 22, NA, 28, 22, 24, 36, 24, 2…
$ SALARY           <int> 44570, 89207, 40955, 45550, 43161, 48736, 40339, 4005…
$ UNIT             <chr> "Finance", "Web", "Finance", "IT", "Operations", "Mar…
$ LEAVES.USED      <int> 24, NA, 23, 22, 27, 20, 19, 29, 20, 15, 22, NA, 15, 2…
$ LEAVES.REMAINING <int> 6, 13, 7, 8, 3, 10, 11, 1, 10, 15, 8, 11, 15, 7, 0, N…
$ RATINGS          <int> 2, NA, 3, 3, NA, 4, 5, 2, 3, 3, 4, 2, 5, 2, 4, 5, 3, …
$ PAST.EXP         <int> 0, 7, 0, 0, 3, 0, 0, 2, 1, 0, 1, 9, 1, 0, 1, 0, 2, 1,…

Business Unit Frequencies

Most common business unit

  • Before I started answering my research questions, I first wanted to see the most common business segments to work in as a data analyst.

Analysis

  • According to this bar plot, the business unit that has the most data professionals is IT.
  • It is surprising to see that the web segment is one of the segments with the fewest data professionals.
  • The business segment with the fewest data professionals is management.

Bar Chart

Highest-paying business units

Highest-paying business units

  • My first research question is: What are the highest-paying areas of business in which to work as a data professional?
  • To answer this question, I created side-by-side box plots of senior analyst salaries in every business unit.

Analysis

  • The side-by-side box plots show that the business units with the highest median salaries are web and IT. This means that a typical senior analyst in IT or web earns more than a typical senior analyst in the other business units.
  • The business segment with the widest range of salaries is operations. This means that senior analyst salaries in operations have more variability compared to other segments.
  • The box plot shows the segments with the lowest medians are marketing and management. This means that senior analyst salaries in marketing and management are generally lower than other business professions.

Salary by gender in business units

Salary differences by gender

  • My third research question is: Are there disproportionate pay gaps in data science fields?
  • In order to compare salary differences of men versus women based on business unit, I first needed to filter the dataset into smaller datasets.
  • It is important to filter the dataset so that every employee in each smaller dataset is from the same business unit and designation. This minimizes the effect of outliers on the graphs.
  • After I filtered the dataset into smaller ones, I created boxplots to compare the salaries of entry-level female data analysts and entry-level male data analysts.

Analysis

  • The business units with the largest differences in entry-level salaries between men and women are marketing and operations. The graphs show that men make significantly more than women while working as data analysts in the marketing field.
  • In contrast, the graphs also show that women make more than men while working in operations as a data analyst.
  • While there are small salary differences between men and women in the other graphs, they are not significant enough to conclude that the differences in salaries are due to gender.

Designation by gender in business units

Frequency of designation by gender

  • My second research question is: Do men have advantages over women when entering the field of data science?
  • To understand whether men have advantages over women in data science professions, it is important to see how many women there are versus men in high-ranking analyst positions and in regular positions.
  • To compare the number of men vs. women in senior analyst positions and regular analyst positions, I filtered the dataset into smaller datasets containing only senior analysts and only regular analysts. These bar charts show the frequency of men vs. women in senior analyst positions and regular positions.

Analysis

  • The first bar graph shows that there are more men than women in senior analyst positions. However, the second graph shows that there are more women than men as regular analysts. These two graphs suggest that it is harder for women to be promoted to senior analyst positions, even if it is not as hard to be hired as a regular analyst.
  • Even though there are more women than men in regular positions, the fact that there are more men in senior positions puts women at a disadvantage because they are underrepresented in the leadership of their company.

Years of experience & salary

Analysts with zero experience

  • My second research question is: Do men have advantages over women when entering the field of data science?
  • In order to answer the research question if men have advantages over women when entering the data science field, it is necessary to understand if men have an easier time getting hired into data positions.
  • For this graph, I filtered the dataset into a smaller one that contains only entry-level analysts with zero experience working in the IT unit. The bar chart shows the number of men vs. women hired into analyst roles with no prior experience.

Analysis

  • This bar chart clearly shows that men are more likely to be hired as a data analyst with no experience behind them.
  • If men are more likely to be hired simply because they are men, women are at a disadvantage when looking for a job in the data science field.

Other relevant graphs

Other relevant variables

  • Several other variables contribute to salary differences between employees. The two most relevant are age and years of experience.
  • It is important to understand these variables’ effects on salary as well. These two scatterplots show the relationships of age and years of experience with salary amounts.

Analysis

  • Both of these scatterplots clearly show strong, positive relationships between the variables.
  • The older you are and the more experience you have, the higher your salary will be as a data analyst.

Conclusion

Conclusion

After analyzing this dataset, I have reached the following conclusions about my research questions:

1. What are the most common and highest-paying areas of business in which to work as a data professional?

The most common field to work in as a data analyst is IT, as shown in the first bar graph I made. The highest-paying business units are web and IT. Senior analysts, who have many years of experience in data analysis, typically earn the most in web and IT, as seen in the side-by-side box plots shown earlier.

2. Do men have certain advantages over women in data science professions?

Overall, men with no data analysis experience are more likely than women to be hired into entry-level data analysis positions. Additionally, men are more likely to be promoted to senior analyst positions. This was shown in the bar graphs I made depicting the number of senior analysts by sex and the number of analysts hired with zero years of experience.

3. Are women disproportionately paid in data science fields?

According to the graphs I made depicting salary differences between men and women in different business units, the only business units in which there is a significant difference between the way women and men are paid are marketing and operations. In the marketing field, men are paid significantly more. Additionally, women are generally paid more than men in the operations field. This shows that the differences in the way men and women are paid depend on the business unit they are working in, because it can go both ways.

---
title: "Analyzing Data Science Professions"
output: 
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      bootswatch: minty
      #navbar-bg: "purple"
    orientation: columns
    vertical_layout: fill
    source_code: embed
---

```{r setup, include=FALSE}
library(flexdashboard)
library(DT) #needed with datatable()
library(tidyverse)
library(plotly) #needed with ggplotly()
salary<-read.csv("./salaryprediction.csv")
```

Introduction
===

Column {data-width=550}
---
#### Analyzing Data Science Professions in Business 

#### Introduction

My project will be “Analyzing Data Science Professions in Business.” For this project, I will use a dataset from Kaggle called “Salary of data professions.” Kaggle has pre-processed the data in this dataset to identify salary information about business professionals. 

Column {.tabset data-width=450}
---

### Research Questions
- research questions
  - What are the highest-paying areas of business in which to work as a data professional?
  - Do men have advantages over women when entering the field of data science?
  - Are there disproportionate pay gaps in data science fields?

Background
===


#### Why I chose this topic

I chose this topic because after I graduate, I plan to use my data analytics minor in my career so I can combine my understanding of accounting with data analytics. This is relevant to my life because as a girl with a major that is mostly male-dominated, I am interested to see if that is also the case with my minor. 

#### Research Plan 

To answer my research questions, I used data visualization techniques to analyze patterns in the data. These graphs include bar charts, scatter plots, and box plots, which I used to identify trends and relationships between variables. I was especially interested in discovering how the variables connect, both directly and indirectly.

  

Data
===

Column {data-width=550}
---

### <b><font size = 4><span Style = "color:blue">First 500 observations</span></font></b>

```{r show_table}
datatable(salary[1:500,],rownames=FALSE,colnames=c("first name","last name","sex","date of join","current date","designation","age","salary","unit","leaves used","leaves remaining","ratings","past experience"),options=list(pageLength=20))
```

Column {data-width=450}
---

### <font size = 4><span Style = "color:red">Variables</span></font>

In order to analyze differences between concentrations and salary amounts of different data professionals, we need to understand what each variable means and how it is relevant to the study.  

- First Name: first name of employee
- Last Name: last name of employee
- Sex: sex of employee 
- Date of Join: date the employee started working the job
- Current Date: current date 
- Designation: rank of the employee
- Age: age of employee
- Salary: annual salary of the employee 
- Unit: business unit of the employee
- Leaves Used: number of leaves used by the employee
- Leaves Remaining: number of leaves remaining for the employee
- Ratings: performance ratings of the employee
- Past Experience: past work experience before joining the current company 

```{r}
glimpse(salary)
```
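
The `glimpse()` output shows that the two date columns are stored as text and that some joining dates are blank. As a minimal sketch, assuming the dates are written month-day-year (the toy rows and the `tenure_years` column are hypothetical, not part of the dashboard), they could be parsed like this:

```r
# Toy rows that mimic the DOJ and CURRENT.DATE columns; values are made up.
toy <- data.frame(
  DOJ          = c("5-18-2014", "", "04-03-2013"),
  CURRENT.DATE = c("01-07-2016", "01-07-2016", "01-07-2016"),
  stringsAsFactors = FALSE
)

# Assumption: month-day-year format. Blank strings are parsed as NA.
toy$doj_date     <- as.Date(toy$DOJ, format = "%m-%d-%Y")
toy$current_date <- as.Date(toy$CURRENT.DATE, format = "%m-%d-%Y")

# Hypothetical helper column: years between joining and the snapshot date.
toy$tenure_years <- as.numeric(toy$current_date - toy$doj_date) / 365.25

toy[, c("DOJ", "doj_date", "tenure_years")]
```

Rows with a blank joining date keep `NA` for tenure, so they would need to be excluded or handled before any tenure-based comparison.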


Business Unit Frequencies {data-orientation=columns}
===

Column {data-width=500}
---

#### Most common business unit 

- Before I started answering my research questions, I first wanted to see the most common business segments to work in as a data analyst.

#### Analysis

- According to this bar plot, the business unit that has the most data professionals is IT. 
- It is surprising to see that the web segment is one of the segments with the fewest data professionals. 
- The business segment with the fewest data professionals is management. 
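
The ranking read off the bar plot can also be checked numerically. The sketch below uses base R on a made-up `toy` data frame (not the Kaggle data) to count rows per unit and sort the counts:

```r
# Toy data standing in for the UNIT column; values are made up.
toy <- data.frame(UNIT = c("IT", "IT", "IT", "Finance", "Finance", "Web"))

# Count employees per business unit, largest first.
unit_counts <- sort(table(toy$UNIT), decreasing = TRUE)
unit_counts

# Name of the most common unit in the toy data.
names(unit_counts)[1]
```

Applied to the real data, the same two lines would run on `salary$UNIT`.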

Column {data-width=500}
---

### **Bar Chart**
```{r bar1}
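# Bar chart of the number of data professionals in each business unit.
# Base-graphics par() settings are omitted because they do not affect ggplot2.
unit_bar_plot<-ggplot(salary,aes(x=UNIT))+
  geom_bar()+
  labs(title="number of data professionals by business unit",x="business unit",y="number of professionals")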
unit_bar_plot
```


Highest-paying business units {data-orientation=columns}
===

Column {data-width=500}
---

#### Highest-paying business units

- My first research question is: What are the highest-paying areas of business in which to work as a data professional? 
- To answer this question, I created side-by-side box plots of senior analyst salaries in every business unit. 


#### Analysis

- The side-by-side box plots show that the business units with the highest median salaries are web and IT. This means that a typical senior analyst in IT or web earns more than a typical senior analyst in the other business units. 
- The business segment with the widest range of salaries is operations. This means that senior analyst salaries in operations have more variability compared to other segments. 
- The box plot shows the segments with the lowest medians are marketing and management. This means that senior analyst salaries in marketing and management are generally lower than other business professions.  
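
The medians and spreads that the box plots display can be tabulated directly. This is a minimal base R sketch on made-up salaries (the `toy` data frame and `unit_summary` table are illustrative names, not part of the dashboard):

```r
# Toy senior-analyst salaries; values are made up for illustration.
toy <- data.frame(
  UNIT   = c("IT", "IT", "IT", "Web", "Web", "Web",
             "Marketing", "Marketing", "Marketing"),
  SALARY = c(60000, 70000, 80000, 65000, 72000, 90000,
             50000, 55000, 58000)
)

# Median (the line inside each box) and IQR (the height of each box) per unit.
unit_median <- tapply(toy$SALARY, toy$UNIT, median)
unit_iqr    <- tapply(toy$SALARY, toy$UNIT, IQR)

unit_summary <- data.frame(
  unit   = names(unit_median),
  median = as.numeric(unit_median),
  iqr    = as.numeric(unit_iqr)
)

# Highest median first.
unit_summary[order(-unit_summary$median), ]
```

A table like this makes it easier to state which units have the highest medians and the most variability without reading values off the plot.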

Column {data-width=500}
---

```{r}
analystsalary<-salary%>%filter(DESIGNATION=="Senior Analyst")
unit_salary<-ggplot(analystsalary,aes(x=UNIT,y=SALARY))+geom_boxplot(fill="blue",color="darkblue")+labs(title="boxplot of salary by unit",x="unit",y="salary")
unit_salary
```



Salary by gender in business units {data-orientation=columns}
===

Column {data-width=500}
---


#### Salary differences by gender

- My third research question is: Are there disproportionate pay gaps in data science fields?
- In order to compare salary differences of men versus women based on business unit, I first needed to filter the dataset into smaller datasets. 
- It is important to filter the dataset so that every employee in each smaller dataset is from the same business unit and designation. This minimizes the effect of outliers on the graphs.
- After I filtered the dataset into smaller ones, I  created boxplots to compare the salaries of entry-level female data analysts and entry-level male data analysts.



```{r}
# One data frame per business unit, restricted to entry-level analysts
finance<-salary%>%filter(UNIT=="Finance",DESIGNATION=="Analyst")
marketing<-salary%>%filter(UNIT=="Marketing",DESIGNATION=="Analyst")
management<-salary%>%filter(UNIT=="Management",DESIGNATION=="Analyst")
operations<-salary%>%filter(UNIT=="Operations",DESIGNATION=="Analyst")
web<-salary%>%filter(UNIT=="Web",DESIGNATION=="Analyst")
IT<-salary%>%filter(UNIT=="IT",DESIGNATION=="Analyst")
```


#### Analysis

- The business units with the largest differences in entry-level salaries between men and women are marketing and operations. The graphs show that men make significantly more than women while working as data analysts in the marketing field. 
- In contrast, the graphs also show that women make more than men while working in operations as a data analyst. 
- While there are small salary differences between men and women in the other graphs, they are not significant enough to conclude that the differences in salaries are due to gender. 
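
To put numbers on the gaps seen in the box plots, the median salary can be computed for each unit and sex and then subtracted. The sketch below is an assumption-laden illustration in base R on made-up values; `toy`, `med`, and `gap` are hypothetical names:

```r
# Toy entry-level analyst salaries; values are made up for illustration.
toy <- data.frame(
  UNIT   = rep(c("Marketing", "Operations"), each = 4),
  SEX    = rep(c("F", "M"), times = 4),
  SALARY = c(40000, 46000, 42000, 48000, 47000, 43000, 49000, 45000)
)

# Median salary for every unit (rows) and sex (columns).
med <- tapply(toy$SALARY, list(toy$UNIT, toy$SEX), median)

# Positive gap: men's median is higher. Negative gap: women's median is higher.
gap <- med[, "M"] - med[, "F"]
gap
```

A gap in medians describes the sample only. Judging whether a gap is statistically significant would need a formal test, which the dashboard does not include.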


Column {data-width=500}
---


```{r}
gender_finance<-ggplot(finance, aes(x=SEX,y=SALARY))+geom_boxplot(fill="blue",color="darkblue")+ labs(title = "Boxplot of finance salary by gender",x="gender",y="salary")
gender_finance

gender_marketing<-ggplot(marketing,aes(x=SEX,y=SALARY))+geom_boxplot(fill="blue",color="darkblue")+labs(title="boxplot of marketing salary by gender",x="gender",y="salary")
gender_marketing

gender_IT<-ggplot(IT,aes(x=SEX,y=SALARY))+geom_boxplot(fill="blue",color="darkblue")+labs(title="boxplot of IT salary by gender",x="gender",y="salary")
gender_IT

gender_management<-ggplot(management,aes(x=SEX,y=SALARY))+geom_boxplot(fill="blue",color="darkblue")+labs(title="boxplot of management salary by gender",x="gender",y="salary")
gender_management

gender_operations<-ggplot(operations,aes(x=SEX,y=SALARY))+geom_boxplot(fill="blue",color="darkblue")+labs(title="boxplot of operations salary by gender",x="gender",y="salary")
gender_operations

gender_web<-ggplot(web,aes(x=SEX,y=SALARY))+geom_boxplot(fill="blue",color="darkblue")+labs(title="boxplot of web salary by gender",x="gender",y="salary")
gender_web
```

Designation by gender in business units {data-orientation=columns}
===

Column {data-width=500}
---

#### Frequency of designation by gender
- My second research question is: Do men have advantages over women when entering the field of data science?
- To understand whether men have advantages over women in data science professions, it is important to see how many women there are versus men in high-ranking analyst positions and in regular positions.
- To compare the number of men vs. women in senior analyst positions and regular analyst positions, I filtered the dataset into smaller datasets containing only senior analysts and only regular analysts. These bar charts show the frequency of men vs. women in senior analyst positions and regular positions.



#### Analysis

- The first bar graph shows that there are more men than women in senior analyst positions. However, the second graph shows that there are more women than men as regular analysts. These two graphs suggest that it is harder for women to be promoted to senior analyst positions, even if it is not as hard to be hired as a regular analyst. 
- Even though there are more women than men in regular positions, the fact that there are more men in senior positions puts women at a disadvantage because they are underrepresented in the leadership of their company. 
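
Because the two designations contain different numbers of people, shares can be easier to compare than raw counts. A minimal base R sketch on made-up rows (the `toy` data frame is hypothetical) computes the share of each sex within each designation:

```r
# Toy rows standing in for DESIGNATION and SEX; values are made up.
toy <- data.frame(
  DESIGNATION = c(rep("Analyst", 5), rep("Senior Analyst", 4)),
  SEX         = c("F", "F", "F", "M", "M", "F", "M", "M", "M")
)

# Counts by designation (rows) and sex (columns).
counts <- table(toy$DESIGNATION, toy$SEX)

# Row-wise shares: each row sums to 1.
shares <- prop.table(counts, margin = 1)
round(shares, 2)
```

If the share of women drops from the analyst row to the senior analyst row, that matches the underrepresentation pattern described above.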

Column {data-width=500}
---
```{r}
senioranalysts<-salary%>%filter(DESIGNATION=="Senior Analyst")
senior_bar_plot<-ggplot(senioranalysts,aes(x=SEX))+
  geom_bar()+
  labs(title="gender of senior analysts",x="gender",y="number of professionals")
senior_bar_plot
entryanalysts<-salary%>%filter(DESIGNATION=="Analyst")
entry_bar_plot<-ggplot(entryanalysts,aes(x=SEX))+
  geom_bar()+
  labs(title="gender of regular analysts",x="gender",y="number of professionals")
entry_bar_plot
```

Years of experience & salary {data-orientation=columns}
===

Column {data-width=450}
---

#### Analysts with zero experience

- My second research question is: Do men have advantages over women when entering the field of data science?
- In order to answer the research question if men have advantages over women when entering the data science field, it is necessary to understand if men have an easier time getting hired into data positions. 
- For this graph, I filtered the dataset into a smaller one that contains only entry-level analysts with zero experience working in the IT unit. The bar chart shows the number of men vs. women hired into analyst roles with no prior experience.


#### Analysis 

- This bar chart clearly shows that men are more likely to be hired as a data analyst with no experience behind them. 
- If men are more likely to be hired simply because they are men, women are at a disadvantage when looking for a job in the data science field. 
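
One way to check whether a difference in counts is larger than chance would produce is an exact binomial test against an even split. This test is not part of the original analysis. The sketch below applies it to made-up rows, using base R's `subset()` in place of the dashboard's `filter()` calls:

```r
# Toy rows; values are made up for illustration.
toy <- data.frame(
  SEX         = c("M", "M", "M", "F", "F", "M", "F", "M"),
  PAST.EXP    = c(0, 0, 0, 0, 0, 3, 0, 0),
  DESIGNATION = c("Analyst", "Analyst", "Analyst", "Analyst", "Analyst",
                  "Analyst", "Senior Analyst", "Analyst"),
  UNIT        = c("IT", "IT", "IT", "IT", "IT", "IT", "IT", "Web")
)

# Same three conditions as the dashboard chunk, combined in one step.
zero_exp <- subset(toy, PAST.EXP == 0 & DESIGNATION == "Analyst" & UNIT == "IT")

# Number of men and women among the filtered rows.
sex_counts <- table(zero_exp$SEX)
sex_counts

# Assumption: a 50/50 split is the baseline to test against.
test <- binom.test(sex_counts[["M"]], sum(sex_counts), p = 0.5)
test$p.value
```

The dataset records who is employed, not who applied, so even a small p-value would describe the makeup of current staff and would not measure hiring likelihood.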

Column {data-width=450}
---
```{r}
zeroexp<-salary%>%filter(PAST.EXP==0)%>%filter(DESIGNATION=="Analyst")%>%filter(UNIT=="IT")
zero_exp_bar_plot<-ggplot(zeroexp,aes(x=SEX))+
  geom_bar()+
  labs(title="gender of regular analysts with no experience",x="gender",y="number of professionals")
zero_exp_bar_plot
```


Other relevant graphs {data-orientation=columns}
===

Column {data-width=450}
---

#### Other relevant variables
- Several other variables contribute to salary differences between employees. The two most relevant are age and years of experience. 
- It is important to understand these variables' effects on salary as well. These two scatterplots show the relationships of age and years of experience with salary amounts. 

#### Analysis 

- Both of these scatterplots clearly show strong, positive relationships between the variables.
- The older you are and the more experience you have, the higher your salary will be as a data analyst. 
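
The strength of each relationship can be summarized with a correlation coefficient. The sketch below uses made-up values and assumes Pearson correlation is an acceptable summary. Since `AGE` contains missing values in the real data, `use = "complete.obs"` drops incomplete pairs:

```r
# Toy rows standing in for AGE, PAST.EXP and SALARY; values are made up.
toy <- data.frame(
  AGE      = c(21, NA, 24, 28, 36, 40),
  PAST.EXP = c(0, 7, 1, 3, 9, 12),
  SALARY   = c(44000, 89000, 47000, 60000, 95000, 120000)
)

# Pearson correlation, ignoring rows where either value is missing.
cor_age <- cor(toy$AGE, toy$SALARY, use = "complete.obs")
cor_exp <- cor(toy$PAST.EXP, toy$SALARY, use = "complete.obs")

c(age = cor_age, experience = cor_exp)
```

Values close to 1 would support describing the relationships as strong and positive. Pearson correlation only captures linear trends, so the smoothed curves in the scatterplots remain worth inspecting.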

Column {data-width=450}
---

```{r}
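# Salary against age, with a smoothed trend line
age_salary_scatterplot<-ggplot(salary,aes(x=AGE,y=SALARY))+geom_point(color="blue")+geom_smooth(color="red")+labs(title="salary by age",x="age",y="salary")+theme(text = element_text(size=20))
age_salary_scatterplot

# Salary against years of past experience, with a smoothed trend line
experience_salary_scatterplot<-ggplot(salary,aes(x=PAST.EXP,y=SALARY))+geom_point(color="blue")+geom_smooth(color="red")+labs(title="salary by past experience",x="years of past experience",y="salary")+theme(text = element_text(size=20))
experience_salary_scatterplot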
```


Conclusion 
===

#### Conclusion

After analyzing this dataset, I have reached the following conclusions about my research questions:

#### 1. What are the most common and highest-paying areas of business in which to work as a data professional?

The most common field to work in as a data analyst is IT, as shown in the first bar graph I made. The highest-paying business units are web and IT. Senior analysts, who have many years of experience in data analysis, typically earn the most in web and IT, as seen in the side-by-side box plots shown earlier. 

#### 2. Do men have certain advantages over women in data science professions?
Overall, men with no data analysis experience are more likely than women to be hired into entry-level data analysis positions. Additionally, men are more likely to be promoted to senior analyst positions. This was shown in the bar graphs I made depicting the number of senior analysts by sex and the number of analysts hired with zero years of experience.

#### 3. Are women disproportionately paid in data science fields?

According to the graphs I made depicting salary differences between men and women in different business units, the only business units in which there is a significant difference between the way women and men are paid are marketing and operations. In the marketing field, men are paid significantly more. Additionally, women are generally paid more than men in the operations field. This shows that the differences in the way men and women are paid depend on the business unit they are working in, because it can go both ways.